Composition Decomposition Linearization Parsing Extraction Statistical Language Lexicon Source Language Lexicon Target Analysis Generation Realization Lexical Selection LCS Parse Word Lattice AMR Target

Author

  • Nizar Habash
Abstract

This paper describes a large-scale language-independent evaluation of the use of Thematic Hierarchies in natural language generation. We translate from a corpus of sentences reflecting the full variety of behavior of Levin-based verb classes. The corpus is used as input to a generation system that utilizes the same thematic hierarchy for realizing relative argument surface positions in two languages: English and Spanish. The output was manually evaluated by English and Spanish speakers. The contributions of this work include: (1) an improved thematic hierarchy over an earlier implementation; (2) a large-scale evaluation of the use of thematic hierarchies in two languages; (3) an implementation of a language-independent module for natural language generation; and (4) the creation of a single tool for incremental development of multilingual lexicons.

1 Motivation

In (Dorr et al., 1998), an implementation of thematic hierarchies for efficient natural language generation was presented. The use of the thematic hierarchy was evaluated using a small hand-constructed corpus of 100 English sentences reflecting a variety of English verb classes and alternations. The hierarchy was implemented using cascading rules within the grammar formalism provided as part of the natural language realization engine Nitrogen (Langkilde and Knight, 1998a; Langkilde and Knight, 1998b). Some of the shortcomings of this earlier work are: (1) inadequate evaluation due to the use of a small test corpus; (2) limitation of the approach to one language only (English); (3) lack of a principled design in the implementation. This paper presents a more systematic implementation of thematic hierarchies and a large-scale evaluation of their use for generation in English and Spanish. This evaluation was helpful in the incremental development of both the thematic hierarchy and the English and Spanish lexicons.
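The idea of using a thematic hierarchy to decide relative argument surface positions can be illustrated with a minimal sketch. Everything in it is an assumption made for illustration: the role names, their ranking in HYPOTHETICAL_HIERARCHY, and the helper functions order_arguments and realize are not taken from the paper or from Nitrogen, and a plain sort is used here in place of the cascading grammar rules the paper describes.

```python
# Illustrative only: the role names and their ranking are assumptions,
# not the thematic hierarchy evaluated in the paper.
HYPOTHETICAL_HIERARCHY = ["agent", "instrument", "theme", "goal", "location"]


def order_arguments(args):
    """Split a non-empty {role: phrase} dict into (subject, other phrases).

    The highest-ranked role present becomes the subject; the remaining
    phrases follow in hierarchy order. Roles missing from the hierarchy
    sort last and keep their input order (sorted() is stable).
    """
    rank = {role: i for i, role in enumerate(HYPOTHETICAL_HIERARCHY)}
    ordered = sorted(args.items(), key=lambda kv: rank.get(kv[0], len(rank)))
    phrases = [phrase for _, phrase in ordered]
    return phrases[0], phrases[1:]


def realize(verb, args):
    """Linearize as: subject, verb, then the remaining arguments."""
    subject, rest = order_arguments(args)
    return " ".join([subject, verb] + rest)


# The same ranking yields different subjects depending on which roles are present.
print(realize("broke", {"theme": "the window", "agent": "John"}))
print(realize("broke", {"theme": "the window"}))
```

With both an agent and a theme, the agent surfaces as subject; with the theme alone, the theme is promoted. A real system would also need language-specific word order and lexical information, which this sketch leaves out.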
The work presented here is part of the generation component (Traum and Habash, 2000) of the interlingual Machine Translation effort at the University of Maryland College Park. The generation component has also been used in Cross-Language Information Retrieval research (Levow et al., 2000). The interlingual representation used is Lexical Conceptual Structure (LCS), a compositional abstraction with language-independent properties that tran-

One of the major challenges in natural language processing is the ability to make use of existing resources. Large differences in syntax, semantics, and ontologies of such resources create significant barriers to their usage in large-scale applications. A case in point is the wide range of "interlingual representations" used in machine translation and cross-language processing. Such representations are becoming increasingly prevalent, yet views …


Similar resources

A Thematic Hierarchy for Efficient Generation from Lexical-Conceptual Structure

This paper describes an implemented algorithm for syntactic realization of a target-language sentence from an interlingual representation called Lexical Conceptual Structure (LCS). We provide a mapping between LCS thematic roles and Abstract Meaning Representation (AMR) relations; these relations serve as input to an off-the-shelf generator (Nitrogen). There are two contributions of this work: (...


Word Sense Disambiguation Using a Second Language Monolingual Corpus

This paper presents a new approach for resolving lexical ambiguities in one language using statistical data from a monolingual corpus of another language. This approach exploits the differences between mappings of words to senses in different languages. The paper concentrates on the problem of target word selection in machine translation, for which the approach is directly applicable. The prese...


Transmuter: An Approach to Rule-based English to Marathi Machine Translation

This paper describes the architecture of a Machine Translation System with source language as English and target language as Marathi. The basic approach used for the development of this system is Rule Based Machine Translation. The basic algorithm for obtaining the correct word order in the target language was developed based on specific traversals of the parse tree. One of the special features...


Extending Statistical Machine Translation with Discriminative and Trigger-Based Lexicon Models

In this work, we propose two extensions of standard word lexicons in statistical machine translation: A discriminative word lexicon that uses sentence-level source information to predict the target words and a trigger-based lexicon model that extends IBM model 1 with a second trigger, allowing for a more fine-grained lexical choice of target words. The models capture dependencies that go beyond...


Generation Realization Lexical Selection LCS Parse Word Lattice AMR Target

This paper describes a large-scale language-independent evaluation of the use of Thematic Hierarchies in natural language generation. We translate from a corpus of sentences reflecting the full variety of behavior of Levin-based verb classes. The corpus is used as input to a generation system that utilizes the same thematic hierarchy for realizing relative argument surface positions in two langu...



Publication date: 2006